
Conversation

@misrasaurabh1 (Contributor) commented on Aug 28, 2025

PR Type

Enhancement, Bug fix


Description

  • Add pytest test splitting for parallel tracing

  • Launch multiple tracer subprocesses in parallel

  • Aggregate multiple replay test paths

  • Clean up all generated replay files


Diagram Walkthrough

flowchart LR
  A["CLI args with pytest"] -- "detect test paths" --> B["pytest_split()"]
  B -- "multiple splits" --> C["spawn Popen per split"]
  C -- "wait & collect" --> D["gather replay_test_paths"]
  D -- "optimize" --> E["codeflash --replay-test <paths>"]
  B -- "single/no split" --> F["single subprocess.run"]
  F -- "collect" --> D
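
A minimal sketch of the orchestration the diagram describes; trace_splits and build_argv are illustrative placeholders, not the actual tracer.py API:

import subprocess
from pathlib import Path

def trace_splits(pytest_splits: list[list[str]], build_argv) -> None:
    # Single or no split: fall back to one blocking subprocess.run.
    if len(pytest_splits) <= 1:
        only_split = pytest_splits[0] if pytest_splits else []
        subprocess.run(build_argv(only_split, 1), check=False)
        return
    # Multiple splits: spawn one tracer subprocess per split, then wait for all.
    processes = [
        subprocess.Popen(build_argv(split, i), cwd=Path.cwd())
        for i, split in enumerate(pytest_splits, start=1)
    ]
    for process in processes:
        process.wait()
    # The caller then gathers each split's replay test path and hands the
    # combined list to codeflash --replay-test <paths>.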

File Walkthrough

Relevant files (Enhancement)

codeflash/tracer.py (+89/-40): Parallelize tracer execution for pytest runs

  • Introduce pytest-aware splitting and parallel tracing.
  • Spawn multiple tracer subprocesses with split-specific args.
  • Collect multiple replay test paths and pass to optimizer.
  • Ensure cleanup of all replay and trace artifacts.

codeflash/tracing/pytest_parallelization.py (+83/-0): Add pytest test splitting utility

  • Add pytest_split to parse pytest args and discover tests.
  • Randomize and split tests into balanced groups (see the sketch below).
  • Support directories and files; handle edge cases.
  • Return split groups and original test paths.
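
The balanced splitting in pytest_parallelization.py can be approximated with a shuffle plus round-robin assignment; a minimal sketch, not the exact pytest_split implementation:

import random

def split_tests(test_files: list[str], num_splits: int) -> list[list[str]]:
    # Shuffle so slow tests are less likely to cluster in a single split.
    files = list(test_files)
    random.shuffle(files)
    # Never create more splits than there are test files.
    num_splits = max(1, min(num_splits, len(files)))
    # Round-robin assignment keeps group sizes within one test of each other.
    return [files[i::num_splits] for i in range(num_splits)]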

Signed-off-by: Saurabh Misra <[email protected]>

github-actions bot commented Aug 28, 2025

PR Reviewer Guide 🔍

(Review updated until commit 060a78c)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

When building per-split output paths, the code assumes outfile is not None. If outfile is None, accessing parent, stem, or suffix will raise an AttributeError. Validate outfile up front or guard this branch.

outpath = parsed_args.outfile
outpath = outpath.parent / f"{outpath.stem}_{i}{outpath.suffix}"
args_dict["output"] = str(outpath)
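
A small guard along these lines would avoid the crash; the fallback name is a hypothetical default for illustration, not the PR's actual behavior:

from pathlib import Path
from typing import Optional

def per_split_output_path(outfile: Optional[Path], i: int) -> Path:
    # Derive a per-split output path; fall back to a hypothetical default name
    # when no outfile was provided.
    if outfile is None:
        return Path(f"codeflash_trace_{i}.trace")
    return outfile.parent / f"{outfile.stem}_{i}{outfile.suffix}"
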
API Robustness

Using pytest.Parser().parse_known_args on arbitrary CLI args may not be stable across pytest versions and may not capture file_or_dir as expected. Consider a safer way to extract test paths (e.g., filter positional paths that exist) to avoid ImportError/version drift.

    import pytest

    parser = pytest.Parser()

    pytest_args = parser.parse_known_args(arguments)
    test_paths = getattr(pytest_args, "file_or_dir", None)
    if not test_paths:
        return None, None

except ImportError:
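
One version-stable alternative is to skip pytest's Parser entirely and keep only positional arguments that resolve to real paths; a rough sketch of that idea, with an illustrative function name:

from pathlib import Path

def extract_test_paths(arguments: list[str]) -> list[str]:
    # Treat any non-flag argument that points at an existing file or directory
    # as a test path; everything else is left for pytest to interpret.
    return [arg for arg in arguments if not arg.startswith("-") and Path(arg).exists()]
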
Return Type Consistency

The function returns different shapes: list[list[str]] in most cases, but a flat list[str] when splitting is unnecessary. This complicates callers and can cause mis-detection of parallelism. Standardize on always returning list[list[str]] and treat len(splits) <= 1 as the single-split case.

    return test_files, test_paths

num_splits = min(num_splits, max_possible_splits)
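
A caller-side normalization that enforces the list[list[str]] shape could look like this; the name is illustrative only:

def normalize_splits(splits) -> list[list[str]]:
    # Wrap a flat list of test files into a single split so callers can always
    # treat len(splits) <= 1 as the non-parallel case.
    if not splits:
        return []
    if isinstance(splits[0], str):
        return [list(splits)]
    return [list(split) for split in splits]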


github-actions bot commented Aug 28, 2025

PR Code Suggestions ✨

Latest suggestions up to 060a78c
Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Avoid shared dict mutation in parallel

Avoid reusing and mutating a single args_dict across parallel processes; this can
race and produce mismatched output/result_pickle_file_path values. Create a fresh
args per split and pass a pre-serialized JSON string to Popen. Also guard outpath
when outfile is None to prevent attribute errors.

codeflash/tracer.py [109-139]

 if len(pytest_splits) > 1:
     processes = []
     test_paths_set = set(test_paths)
     result_pickle_file_paths = []
     for i, test_split in enumerate(pytest_splits, start=1):
         result_pickle_file_path = get_run_tmp_file(Path(f"tracer_results_file_{i}.pkl"))
         result_pickle_file_paths.append(result_pickle_file_path)
-        args_dict["result_pickle_file_path"] = str(result_pickle_file_path)
+        # Build per-process args without mutating shared dict
+        per_args = dict(args_dict)
+        per_args["result_pickle_file_path"] = str(result_pickle_file_path)
         outpath = parsed_args.outfile
-        outpath = outpath.parent / f"{outpath.stem}_{i}{outpath.suffix}"
-        args_dict["output"] = str(outpath)
+        if outpath is not None:
+            outpath = outpath.parent / f"{outpath.stem}_{i}{outpath.suffix}"
+            per_args["output"] = str(outpath)
+        else:
+            per_args["output"] = None
         updated_sys_argv = []
         for elem in sys.argv:
             if elem in test_paths_set:
                 updated_sys_argv.extend(test_split)
             else:
                 updated_sys_argv.append(elem)
-        args_dict["command"] = " ".join(updated_sys_argv)
+        per_args["command"] = " ".join(updated_sys_argv)
+        args_json = json.dumps(per_args)
         processes.append(
             subprocess.Popen(
                 [
                     SAFE_SYS_EXECUTABLE,
                     Path(__file__).parent / "tracing" / "tracing_new_process.py",
                     *updated_sys_argv,
-                    json.dumps(args_dict),
+                    args_json,
                 ],
                 cwd=Path.cwd(),
             )
         )
     for process in processes:
         process.wait()
Suggestion importance[1-10]: 7


Why: Mutating a single args_dict inside the loop can cause incorrect per-process parameters; building a per-process copy is safer, and handling a None outfile avoids an attribute error. The change is contextually correct and improves robustness, though not a critical bug fix.

Impact: Medium
Fix inconsistent return shapes

Ensure return types match the annotated tuple: always return a list of lists for
splits. Wrap test_files in a list when not splitting, and avoid returning [[]] which
signals "one empty split" and may be misinterpreted upstream.

codeflash/tracing/pytest_parallelization.py [49-65]

 if not test_files:
-    return [[]], None
+    return None, None
 ...
 if max_possible_splits == 0:
-    return test_files, test_paths
+    return [list(test_files)], test_paths
Suggestion importance[1-10]: 5


Why: The function's annotated return type suggests consistent shapes; returning [list(test_files)] when not splitting is reasonable. However, changing the "if not test_files" branch to return (None, None) alters behavior and may not be strictly necessary, so the impact is moderate.

Impact: Low
General
Validate per-split replay outputs

Handle missing or None replay_test_file_path per split to avoid appending "None" and
failing later. Collect only valid paths and report which splits failed instead of
exiting all on one failure.

codeflash/tracer.py [140-149]

-for result_pickle_file_path in result_pickle_file_paths:
+for idx, result_pickle_file_path in enumerate(result_pickle_file_paths, start=1):
     try:
         with result_pickle_file_path.open(mode="rb") as f:
             data = pickle.load(f)
-            replay_test_paths.append(str(data["replay_test_file_path"]))
-    except Exception:
-        console.print("❌ Failed to trace. Exiting...")
-        sys.exit(1)
+            rtp = data.get("replay_test_file_path")
+            if rtp:
+                replay_test_paths.append(str(rtp))
+            else:
+                console.print(f"⚠️ No replay test produced for split {idx}.")
+    except Exception as e:
+        console.print(f"❌ Failed to trace split {idx}: {e}")
     finally:
         result_pickle_file_path.unlink(missing_ok=True)
Suggestion importance[1-10]: 6


Why: Filtering out missing replay_test_file_path entries prevents appending "None" and provides clearer diagnostics per split. It's accurate and helpful for resilience but not critical.

Impact: Low

Previous suggestions

Suggestions up to commit 936fa0a
Category | Suggestion | Impact
Possible issue
Prevent duplicate test arguments

After extending updated_sys_argv with a split, set added_paths = True so you don’t
repeatedly inject the same tests on subsequent matches. This prevents duplicated
test arguments in each subprocess call.

codeflash/tracer.py [119-126]

 added_paths = False
 updated_sys_argv = []
 for elem in sys.argv:
-    if elem in test_paths_set:
-        if not added_paths:
-            updated_sys_argv.extend(test_split)
+    if elem in test_paths_set and not added_paths:
+        updated_sys_argv.extend(test_split)
+        added_paths = True
     else:
         updated_sys_argv.append(elem)
Suggestion importance[1-10]: 8


Why: Setting added_paths = True prevents repeatedly injecting test_split into updated_sys_argv, fixing duplicate test arguments in subprocess calls.

Impact: Medium
General
Rebuild pytest args from unknown_args

Instead of iterating over the full sys.argv, rebuild the pytest invocation from
unknown_args to ensure only the intended test paths and flags are included. This
simplifies argument handling and avoids mixing tracer flags with pytest flags.

codeflash/tracer.py [121-126]

-for elem in sys.argv:
-    if elem in test_paths_set:
-        if not added_paths:
-            updated_sys_argv.extend(test_split)
-    else:
-        updated_sys_argv.append(elem)
+# Rebuild pytest command with split tests
+base_pytest = unknown_args[0]
+other_args = [arg for arg in unknown_args[1:] if arg not in test_paths_set]
+updated_sys_argv = [base_pytest, *other_args, *test_split]
Suggestion importance[1-10]: 4


Why: Rebuilding from unknown_args could simplify argument handling, but it risks dropping tracer-specific flags and may not cover all edge cases.

Impact: Low

@aseembits93 marked this pull request as ready for review on September 6, 2025 at 00:09

github-actions bot commented Sep 6, 2025

Persistent review updated to latest commit 060a78c
